AITopics | Supervised Learning

Collaborating Authors

Supervised Learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Training Infinitely Deep and Wide Transformers

Barboni, Raphaël, de Hoop, Maarten V., Furuya, Takashi, Peyré, Gabriel

arXiv.org Machine LearningMay-19-2026

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.

arXiv.org Machine Learning

2605.1766

Country: Europe (0.45)

Genre: Research Report (0.81)

Industry: Energy > Oil & Gas > Upstream (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Random-Effects Algorithm for Random Objects in Metric Spaces

Matabuena, Marcos, Cámara, Mateo

arXiv.org Machine LearningMay-5-2026

Across many scientific disciplines, multiple observations are collected from the same experimental units, and in modern datasets these observations often arise as non-Euclidean random objects. In such settings, the incorporation of random effects is a critical modeling step for efficient estimation and personalized prediction. Although mixed-effects models are well established for scalar outcomes and, more recently, for functional data in Hilbert spaces, general random-effects frameworks for objects in metric spaces remain underdeveloped. In this paper, we propose a nonlinear Fréchet-based algorithm for random-effects modeling of arbitrary random objects defined on a metric space. Using M-estimation theory, we establish conditions under which the proposed metric-space prediction target is consistently estimated under a working random-effects formulation. We then evaluate the empirical performance of the proposed method using both synthetic data and digital health datasets that require practical tools for analyzing random objects in metric spaces, such as multivariate probability distributions and random graphs. We show that, although our method is developed beyond Hilbert spaces, it can outperform existing Hilbert space-based methods.

artificial intelligence, machine learning, random effect, (17 more...)

arXiv.org Machine Learning

2605.02693

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.97)
Health & Medicine > Health Care Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (1.00)

Add feedback

Language-based Action Concept Spaces Improve Video Self-Supervised Learning

Neural Information Processing SystemsApr-30-2026, 05:23:14 GMT

Recent contrastive language image pre-training has led to learning highly transferable and robust image representations. However, adapting these models to video domain with minimal supervision remains an open problem. We explore a simple step in that direction, using language tied self-supervised learning to adapt an image CLIP model to the video domain. A backbone modified for temporal modeling is trained under self-distillation settings with train objectives operating in an action concept space. Feature vectors of various action concepts extracted from a language encoder using relevant textual prompts construct this space. A large language model aware of actions and their attributes generates the relevant textual prompts. We introduce two train objectives, concept distillation and concept alignment, that retain generality of original representations while enforcing relations between actions and their attributes. Our approach improves zero-shot and linear probing performance on three action recognition benchmarks.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Strategic Classification under Unknown Personalized Manipulation Anonymous Author(s) Affiliation Address email

Neural Information Processing SystemsApr-27-2026, 13:45:49 GMT

We study the fundamental mistake bound and sample complexity in the strategic1 classification, where agents can strategically manipulate their feature vector up2 to an extent in order to be predicted as positive. For example, given a classifier3 determining college admission, student candidates may try to take easier classes to4 improve their GPA, retake SAT and change schools in an effort to fool the classifier.5 Ball manipulations are a widely studied class of manipulations in the literature,6 where agents can modify their feature vector within a bounded radius ball. Unlike7 most prior work, our work consider manipulations to be personalized, meaning8 that agents can have different levels of manipulation abilities (e.g., varying radii9 for ball manipulations), and unknown to the learner.10 We formalize the learning problem in an interaction model where the learner11 first deploys a classifier and the agent manipulates the feature vector within their12 manipulation set to game the deployed classifier. We investigate various scenarios13 in terms of the information available to the learner during the interaction, such14 as observing the original feature vector before or after deployment, observing the15 manipulated feature vector, or not seeing either the original or the manipulated16 feature vector. We begin by providing online mistake bounds and PAC sample17 complexity in these scenarios for ball manipulations. We also explore non-ball18 manipulations and show that, even in the simplest scenario where both the original19 and the manipulated feature vectors are revealed, the mistake bounds and sample20 complexity are lower bounded by Ω(|H|) when the target function belongs to a21 known class H.22

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Education > Educational Setting (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

543924fdf260ba990f2ef84f940f3db2-Paper-Conference.pdf

Neural Information Processing SystemsApr-27-2026, 13:45:47 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Education > Educational Setting (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.45)

Add feedback

First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces

Neural Information Processing SystemsApr-25-2026, 06:01:59 GMT

From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. have recently shown that geodesic convex concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative--we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically stronglyconvex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemanian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold.

artificial intelligence, machine learning, manifold, (15 more...)

Neural Information Processing Systems

Country: Asia (0.15)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.40)

Add feedback

INDIGO: GNN-Based Inductive Knowledge Graph Completion Using Pair-Wise Encoding

Neural Information Processing SystemsApr-24-2026, 17:50:51 GMT

The aim of knowledge graph (KG) completion is to extend an incomplete KG with missing triples. Popular approaches based on graph embeddings typically work by first representing the KG in a vector space, and then applying a predefined scoring function to the resulting vectors to complete the KG. These approaches work well in transductive settings, where predicted triples involve only constants seen during training; however, they are not applicable in inductive settings, where the KG on which the model was trained is extended with new constants or merged with other KGs. The use of Graph Neural Networks (GNNs) has recently been proposed as a way to overcome these limitations; however, existing approaches do not fully exploit the capabilities of GNNs and still rely on heuristics and adhoc scoring functions. In this paper, we propose a novel approach, where the KG is fully encoded into a GNN in a transparent way, and where the predicted triples can be read out directly from the last layer of the GNN without the need for additional components or scoring functions. Our experiments show that our model outperforms state-of-the-art approaches on inductive KG completion benchmarks.

artificial intelligence, benchmark, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe (0.47)

Genre:

Research Report > Promising Solution (0.54)
Overview > Innovation (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.36)

Add feedback

A Hierarchical Sheaf Spectral Embedding Framework for Single-Cell RNA-seq Analysis

Wang, Xiang Xiang, We, Guo-Wei

arXiv.org Machine LearningMar-31-2026

Single-cell RNA-seq data analysis typically requires representations that capture heterogeneous local structure across multiple scales while remaining stable and interpretable. In this work, we propose a hierarchical sheaf spectral embedding (HSSE) framework that constructs informative cell-level features based on persistent sheaf Laplacian analysis. Starting from scale-dependent low-dimensional embeddings, we define cell-centered local neighborhoods at multiple resolutions. For each local neighborhood, we construct a data-driven cellular sheaf that encodes local relationships among cells. We then compute persistent sheaf Laplacians over sampled filtration intervals and extract spectral statistics that summarize the evolution of local relational structure across scales. These spectral descriptors are aggregated into a unified feature vector for each cell and can be directly used in downstream learning tasks without additional model training. We evaluate HSSE on twelve benchmark single-cell RNA-seq datasets covering diverse biological systems and data scales. Under a consistent classification protocol, HSSE achieves competitive or improved performance compared with existing multiscale and classical embedding-based methods across multiple evaluation metrics. The results demonstrate that sheaf spectral representations provide a robust and interpretable approach for single-cell RNA-seq data representation learning.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

2603.26858

Country:

North America > United States > Missouri > Greene County > Springfield (0.04)
North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.35)

Add feedback

Improved Guarantees for Fully Dynamic k-Center Clustering with Outliers in General Metric Spaces

Neural Information Processing SystemsFeb-17-2026, 03:08:51 GMT

In the first case, points arrive sequentially, and only a summary can be stored in memory.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
Europe > Germany (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.41)

Add feedback

Geometry-Aware Adaptation for Pretrained Models

Neural Information Processing SystemsFeb-16-2026, 02:51:53 GMT

Machine learning models--including prominent zero-shot models--are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.50)

Add feedback